Abstract

Cancers originating from epithelial cells are the most
common malignancies. No common expression profile
of solid tumors compared to normal tissues has been
described so far. We therefore asked whether genes
differentially expressed in the majority of carcinomas
could be identified using bioinformatic methods.
Complete data sets were downloaded for carcinomas
of the prostate, breast, lung, ovary, colon, pancreas,
stomach, bladder, liver, and kidney, and were subjected
to an expression analysis using SAM. In each
experiment, a gene was scored as differentially expressed
if the q value was below 25%. Probe identifiers
were unified by comparing the respective probe
sequences to the Unigene build 155 using BlastN. To
obtain differentially expressed genes within the set of
analyzed carcinomas, the number of experiments in
which differential expression was observed was
counted. Genes were called differentially expressed
if they were differentially expressed in at least
eight experiments of tumors of different origin. Using
this comparative approach, 100 genes were identified as
upregulated and 21 genes as downregulated in the
carcinomas. The identified candidate genes ADRM1,
EBNA1BP2, FDPS, FOXM1, H2AFX, HDAC3, IRAK1, and YY1
were subjected to further validation.
Introduction

Solid tumors are the second most common cause of death
in the western world. Apart from a few successes in rare
solid tumors such as testicular cancer, the survival rate of
most of these tumors is still low. However, many of these
tumors are characterized by inactivating mutations of common
tumor-suppressor genes, such as p53, p16, or Rb, and
activation of oncogenes such as Her2/neu [1,2]. One might
therefore expect that not only these genes are differentially
expressed within a broad range of tumors, but that there are
also other, as yet undescribed genes that are differentially
expressed in the majority of tumors.

Large-scale gene expression analysis by means of
microarrays has yielded large sets of class 2 cancer genes
[3]. This enables us to compare the expression profiles of
various tumors and to generate sets of common differentially 
expressed genes. These genes would then represent a pool of 
interesting candidates used to give new insights into tumor 
development, and are candidates for new targets for therapy 
and diagnostics in a variety of cancers. Attempts have been 
made to compare the gene expression profiles of tumors of 
a single entity and to assign differentially expressed genes 
[4-9]. In addition, gene expression differences of tumors 
from diverse organs have been analyzed with various 
approaches [10-20]. 

Gene expression profiles are obtained by different techni- 
ques such as DDRT-PCR, SAGE, expressed sequence tag 
(EST) sequencing, or microarrays [21,22]. Moreover, within the 
field of microarrays, there are at least three different rival 
technologies. The most often used techniques, Affymetrix 
GeneChips and cDNA on glass microarrays, generate different 
data types and are accompanied by their own challenges within 
data preprocessing, leading to sets of gene expression data 
that are not easy to compare [23]. 

To identify common differentially expressed genes in solid 
tumors and to overcome these limitations, we have analyzed 
the differential expression of genes within each experiment by 
a single method with defined thresholds for 11 different
carcinomas. We identified a set of 121 differentially expressed
genes and validated eight of these using dot blots containing 
the transcriptome of different solid cancers. 



Materials and Methods 

Data Sets
Only data sets containing values for both tumors and normal
tissues were selected. Gene expression profiles were obtained
by downloading the expression values in tabular form from the
Stanford Microarray Database (SMD; http://genome-www5.stanford.edu//),
GEO (http://www.ncbi.nlm.nih.gov/geo/), and the supplementary
information of published manuscripts (Table 1). All data sets
were used as provided and, if needed, were normalized by median
centering. To unify the gene identifiers, all probe sequences
were compared against the Unigene database version 155
using the BlastN program [24]. Sequences were assigned to
a Unigene cluster if the e value of the match was below
e-100, or if the score was higher than 1.5 per base
analyzed. Data from the following tumors were acquired:
prostate, breast, lung, ovary, colon, pancreas, stomach,
bladder, liver, and kidney.
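
The assignment rule for probe sequences can be illustrated with a short sketch. It assumes that BlastN hits have already been parsed into simple records; the `BlastHit` type, its field names, and the function `assign_cluster` are hypothetical. "Score per base" is interpreted here as the hit score divided by the number of probe bases, and the best accepted hit is chosen by score; the text specifies neither detail.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

E_VALUE_CUTOFF = 1e-100      # e value threshold stated in the text
SCORE_PER_BASE_CUTOFF = 1.5  # score-per-base threshold stated in the text


@dataclass
class BlastHit:
    """One parsed BlastN hit of a probe against Unigene (hypothetical record)."""
    cluster_id: str    # Unigene cluster of the matched sequence
    e_value: float
    score: float
    query_length: int  # number of probe bases analyzed (must be > 0)


def passes_threshold(hit: BlastHit) -> bool:
    """Accept a hit if either of the two criteria from the text is met."""
    score_per_base = hit.score / hit.query_length
    return hit.e_value < E_VALUE_CUTOFF or score_per_base > SCORE_PER_BASE_CUTOFF


def assign_cluster(hits: Iterable[BlastHit]) -> Optional[str]:
    """Return the cluster of the highest-scoring accepted hit, or None.

    Choosing the highest-scoring accepted hit is an assumption; the text does
    not say how several acceptable hits for one probe were resolved.
    """
    accepted = [h for h in hits if passes_threshold(h)]
    if not accepted:
        return None
    return max(accepted, key=lambda h: h.score).cluster_id
```

A probe without any accepted hit stays unassigned (`None`) and would simply be left out of the cross-platform comparison.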

Identification of Differentially Expressed Genes

All experiments were analyzed using the SAM package
(http://www-stat.stanford.edu/~tibs/SAM/index.html) with a
q value cutoff of 25%. Probes identified by this approach
were assigned a value of 1 if they were overexpressed in the
tumor, or -1 if they were underexpressed. If a probe was not
differentially expressed, we assigned 0; if it was not
analyzed in a given experiment, the field was left blank.
For each probe, we counted the number of experiments in which
it was differentially expressed. To identify genes
differentially expressed in the majority of carcinomas, we
called only those genes differentially expressed that
displayed differential expression in at least eight of the
11 analyzed cancer entities (73%). For the cancer entities
of the kidney, lung, and prostate, we obtained more than one
data set. For these, differentially expressed genes were
identified in each experiment as described above.
Subsequently, within a type of cancer (e.g., prostate
cancer), the differentially expressed genes were compared
and scored as differential only if they were identified in
at least 50% of the experiments. Thus, for prostate cancer,
genes were counted as differentially expressed if they had
been identified in at least two experiments; for kidney and
lung cancer, if they had been identified in at least one
experiment. Such genes were then counted only once for the
comparison between the different cancer entities.

Table 1. Used Gene Expression Profiling Data Sets.
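
The scoring and counting scheme of this section can be sketched as follows. This is a minimal illustration, not the original implementation: the function names and the nested-dictionary layout are hypothetical, blank fields are represented by `None`, and a gene called in opposite directions by equally many experiments of one entity is treated as not differential, a case the text does not address.

```python
from typing import Dict, List, Optional

# +1: overexpressed in tumor, -1: underexpressed, 0: unchanged, None: not analyzed
Call = Optional[int]

MIN_ENTITIES = 8           # at least 8 of the 11 cancer entities (from the text)
MIN_FRACTION_WITHIN = 0.5  # at least 50% of the experiments of one entity


def consolidate_entity(calls: List[Call]) -> Call:
    """Combine the per-experiment calls of one gene within one cancer entity."""
    measured = [c for c in calls if c is not None]
    if not measured:
        return None  # gene not analyzed in any experiment of this entity
    needed = MIN_FRACTION_WITHIN * len(calls)  # e.g. 2 of 4 prostate data sets
    up = measured.count(1) >= needed
    down = measured.count(-1) >= needed
    if up and not down:
        return 1
    if down and not up:
        return -1
    return 0  # unchanged, or conflicting calls (assumption)


def common_genes(table: Dict[str, Dict[str, List[Call]]]) -> Dict[str, int]:
    """Return gene -> +1/-1 for genes called alike in >= MIN_ENTITIES entities.

    table[gene][entity] lists the calls from all experiments of that entity,
    so each entity contributes at most once to the count.
    """
    result: Dict[str, int] = {}
    for gene, per_entity in table.items():
        consolidated = [consolidate_entity(c) for c in per_entity.values()]
        if consolidated.count(1) >= MIN_ENTITIES:
            result[gene] = 1
        elif consolidated.count(-1) >= MIN_ENTITIES:
            result[gene] = -1
    return result
```

Consolidating within an entity first keeps tissues with several data sets, such as prostate, from dominating the count across entities.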



Validation of Differential Expression 

cDNA clones for ADRM1 (accession no. BM913272),
EBNA1BP2 (BU541488), FDPS (BQ877587), FOXM1
(BQ691509), H2AFX (BG757479), HDAC3 (BM468317),
IRAK1 (BE250451), and YY1 (BE746736) were obtained
from the RZPD (www.rzpd.de) and prepared using the
GFX Micro Plasmid Prep kit according to the manufacturer's
recommendation (Amersham Pharmacia Biotech, Freiburg,
Germany). The identity of the inserts was confirmed by
sequencing the clones with an ABI 3700 sequencer and
BigDye terminators (Applied Biosystems, Weiterstadt,
Germany). The derived sequences were checked against the
expected clone sequences using the program BlastN and the
Unigene database. The inserts were prepared by digesting
the DNA with the appropriate restriction enzymes and
purifying the fragments from agarose gels. The fragments
were labelled with α-32P-dCTP (~6,000 Ci/mmol)
using the Rediprime II kit (Amersham Pharmacia Biotech)
and hybridized to a cancer profiling array II (CPA II) using
the ExpressHyb hybridization solution according to the
manufacturer's recommendations (Clontech, Heidelberg,
Germany).

Autoradiographs were obtained by using a BAS-II scanner
(Raytest, Eggenfelden, Germany). For further analysis,
intensities were normalized to β-actin and divided by the
normalized intensities of the corresponding normal tissues.
These data were subjected to visualization using Treeview 
(http://rana.lbl.gov/EisenSoftware.htm). 
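
The normalization of the dot blot intensities amounts to a ratio of two β-actin-normalized signals. A minimal sketch, with hypothetical function names and with non-positive control signals treated as missing values (an assumption; the text does not describe how such spots were handled):

```python
from typing import Optional


def actin_normalized(signal: float, actin: float) -> Optional[float]:
    """Divide a spot intensity by the beta-actin intensity of the same spot."""
    if actin <= 0:
        return None  # no usable housekeeping signal (assumption: treat as missing)
    return signal / actin


def tumor_normal_ratio(tumor: float, tumor_actin: float,
                       normal: float, normal_actin: float) -> Optional[float]:
    """Return (tumor / actin) / (normal / actin) for one matched sample pair."""
    t = actin_normalized(tumor, tumor_actin)
    n = actin_normalized(normal, normal_actin)
    if t is None or n is None or n <= 0:
        return None
    return t / n
```

For example, `tumor_normal_ratio(200.0, 100.0, 50.0, 100.0)` gives 4.0, i.e., a fourfold higher normalized signal in the tumor than in the matched normal tissue.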



Reference                     | Tissue   | Type of tumor                              | Microarray | Number of tumor samples | Number of normal samples | Number of probes/probesets | Source of data
------------------------------+----------+--------------------------------------------+------------+-------------------------+--------------------------+----------------------------+---------------
Dyrskjot et al. [30]          | Bladder  | Carcinoma                                  | Affymetrix |                         |                          | 8,793                      | GEO
Sorlie et al. [19]            | Breast   | Carcinoma                                  | cDNA       | 68                      | 4                        | 8,102                      | SMD
Notterman et al. [18]         | Colon    | Adenocarcinoma                             | Affymetrix |                         | 18                       | 7,129                      | Supplement
Boer et al. [11]              | Kidney   | Renal cell carcinoma                       | cDNA       | 32                      | 32                       | 31,500                     | GEO
Higgins et al. [15]           | Kidney   | Renal cell carcinoma                       | cDNA       | 24                      | 3                        | 22,648                     | SMD
Chen et al. [12]              | Liver    | Hepatocellular carcinoma                   | cDNA       | 82                      | 75                       | 23,075                     | SMD
Garber et al. [14]            | Lung     | Adenocarcinoma and squamous cell carcinoma | cDNA       | 36 (A) and 13 (S)       | 5                        | 8,102                      | SMD
Bhattacharjee et al. [10]     | Lung     | Adenocarcinoma and squamous cell carcinoma | Affymetrix | 127 (A) and 21 (S)      | 17                       | 12,625                     | Supplement
Welsh et al. [20]             | Ovary    |                                            | Affymetrix | 27                      | 5                        | 8,793                      | Supplement
Iacobuzio-Donahue et al. [16] | Pancreas |                                            | cDNA       | 11                      | 5                        | 44,500                     | SMD
Dhanasekaran et al. [13]      | Prostate | Adenocarcinoma                             | cDNA       | 10                      | 9                        | 9,948                      | Supplement
Luo et al. [31]               | Prostate |                                            | cDNA       | 16                      | 9                        | 6,500                      | Supplement
Singh et al. [32]             | Prostate | Adenocarcinoma                             | Affymetrix | 52                      | 50                       | 12,625                     | Supplement
Welsh et al. [33]             | Prostate | Adenocarcinoma                             | Affymetrix | 25                      | 9                        | 8,793                      | Supplement
Leung et al. [17]             | Stomach  | Adenocarcinoma                             | cDNA       | 89                      | 23                       | 44,500                     | SMD

A: adenocarcinoma; S: squamous cell carcinoma.
Overlap of Data Sets

A prerequisite for a comparative analysis of gene expres- 
sion differences is that a high number of genes are analyzed 
in more than one experiment. This is especially true for 
comparisons using microarray data sets with incomplete 
genome representation. However, within the data sets used,
the majority of genes were analyzed in at least six experiments,
and more than 4000 genes were interrogated in at least eight
experiments (Figure 1A).
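
The coverage check described above can be sketched as a simple count of how many experiments represent each gene. The function names and the input layout (experiment name mapped to the set of Unigene clusters on that array) are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def coverage_histogram(experiments: Dict[str, Set[str]]) -> Counter:
    """Return {k: number of genes represented in exactly k experiments}."""
    per_gene: Counter = Counter()
    for genes in experiments.values():
        per_gene.update(set(genes))  # count each gene once per experiment
    return Counter(per_gene.values())


def genes_covered_at_least(experiments: Dict[str, Set[str]], k: int) -> int:
    """Number of genes represented in at least k experiments."""
    hist = coverage_histogram(experiments)
    return sum(n for count, n in hist.items() if count >= k)
```

With the real data sets, `genes_covered_at_least(experiments, 8)` would correspond to the number of genes interrogated by at least eight experiments.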

Identification of Differential Expression 

Assignment of significance values is a common procedure
in the analysis of gene expression. Several algorithms
are known, but the obtained values are rarely corrected
for multiple testing. To control the error associated
with multiple testing, we used the software SAM
(http://www-stat.stanford.edu/~tibs/SAM/index.html). We
tested several cutoffs for the q value generated by SAM in
a comparison of four different prostate experiments. From
these results, we concluded that a cutoff of 25% would be
sensitive enough to generate a list of common differentially
expressed genes because we found an overlap of 50% with the
data set generated by Rhodes et al. [25] (data not shown).
We also tried to use only those genes with a q value below
5%, but failed to identify differentially expressed genes in
several studies, probably due to a high data variance, which
could in part be attributed to a low sample number of normal
tissues in different studies (data not shown). When we
analyzed how often a gene was classified as differentially
expressed, we observed that most of the genes were never
classified as differentially expressed and only a minor
number were classified more than five times (Figure 1, B and
C). Because we were interested in genes differentially
expressed in the majority of investigated tumors, we chose
to examine genes differentially expressed in at least eight
tumors. Using this threshold, we identified 100 genes as
commonly upregulated and 21 as commonly downregulated in 11
different cancer types from 10 different tissues, as listed
in Figure 2A. Within this set, we identified genes such as
PCNA and OSF-2, which are well known to be overexpressed in
human carcinomas. We then classified the differentially
expressed genes according to their proposed function
(www.geneontology.org; Figure 3). However, for most of the
identified genes, a function has not been assigned yet. Of
the genes that could be assigned to a category, most were
grouped into the categories of metabolism and cell
growth/maintenance. Differences in the distribution of the
assigned categories between genes overexpressed and genes
underexpressed in tumors could not be observed. Annotation
of the genes revealed that many of them are either
transcriptional regulators or take part in the degradation
of proteins (Figure 2A).
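
The comparison of q value cutoffs against a published gene list, mentioned earlier in this section, could be sketched as follows. The names are hypothetical, q values are given as fractions (0.25 for 25%), and "overlap" is taken to mean the fraction of the reference list that is recovered; the text does not define how the 50% overlap was computed.

```python
from typing import Dict, Iterable, Set


def genes_below_cutoff(q_values: Dict[str, float], cutoff: float) -> Set[str]:
    """Genes whose q value lies below the cutoff."""
    return {gene for gene, q in q_values.items() if q < cutoff}


def overlap_fraction(called: Set[str], reference: Set[str]) -> float:
    """Fraction of the reference list recovered (assumed definition of overlap)."""
    if not reference:
        return 0.0
    return len(called & reference) / len(reference)


def scan_cutoffs(q_values: Dict[str, float], reference: Set[str],
                 cutoffs: Iterable[float]) -> Dict[float, float]:
    """Overlap with the reference list for each candidate cutoff."""
    return {c: overlap_fraction(genes_below_cutoff(q_values, c), reference)
            for c in cutoffs}
```

Scanning cutoffs such as 0.05 and 0.25 in this way shows the trade-off reported above: a stricter cutoff yields fewer calls and therefore recovers less of the reference list.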

Validation of Differential Expression 

We selected eight upregulated genes based on their
degree of differential expression and proposed function
(ADRM1, EBNA1BP2, FDPS, FOXM1, H2AFX, HDAC3,
IRAK1, and YY1) and investigated their differential
expression by hybridization of a gene-specific probe to a
CPA II. The CPA II filter contains the transcriptome from 19
different tumor entities. Each tumor is represented by more
than one sample, and a corresponding sample of normal tissue
from the same donor is provided. For each of the spots on the
array, we also obtained the expression values for β-actin as
a "housekeeping" control. To evaluate the differential
expression of the genes, we divided each value by the
corresponding spot value of β-actin. Furthermore, we divided
the normalized values of the tumor by the corresponding
normal values (Figure 2B). As expected, most of these genes
were overexpressed in a number of samples of different tumor
entities, supporting the validity of our approach, though we
could not observe any upregulation of the genes within the
kidney tissues analyzed. Also, only sporadic upregulation
was seen in the mammary tumor samples. This might argue
against our approach; however, except for FOXM1, none of
the genes was identified as overexpressed in kidney cancer
by our comparative analysis, and only five (ADRM1,
EBNA1BP2, FDPS, FOXM1, and H2AFX) were identified in
breast cancer. Also, we were not able to analyze the quality
of the samples in detail. We also observed heterogeneity
of gene expression in the different tumor entities. In our
comparative analysis, FOXM1 was overexpressed in 10 of
11 different tumors. This is reflected by the high
overexpression of FOXM1 in nearly every tumor analyzed.
However, overexpression of FOXM1 could not be observed for
some samples of different origins.

Figure 1. (A) Histogram of the number of genes analyzed by the different experiments. (B and C) Histogram of the number of genes found to be differentially
expressed in the different experiments: (B) overexpressed; (C) underexpressed. Interestingly, we did not find a gene that is underexpressed in more than nine
experiments, indicating a more heterogeneous gene expression in normal tissues of different origins. X-axis: number of experiments; Y-axis: number of genes.

Figure 3. Molecular function of differentially expressed genes. Grouping of identified genes into the molecular function categories of gene ontology using
Fatigo (http://fatigo.bioinfo.cnio.es/). Red: genes overexpressed in tumors; green: genes underexpressed in tumors.



Discussion 

In an attempt to identify genes commonly overexpressed
within solid tumors, we have compared gene expression
profiling experiments of 11 different tumors from 10 different
organs. We identified the differentially expressed genes
using a comparative analysis. The major challenge herein
was to unify the different identifiers of the experiments. The
data sets used were derived from different microarray
platforms with different structures of identifiers. Whereas
Affymetrix GeneChip data provide a probeset identifier with a
target sequence, the data sets obtained from cDNA microarrays
provide only an accession number of the clones used. This
leads to the need for a large computational effort.
Unfortunately, most of the gene expression data have not been
published as raw data so far. Therefore, published articles
are the prevalent repository for differentially expressed
genes. Comparison of these data has its own merit [26],
but only analysis of the raw data circumvents the inherent
problems of microarrays in data normalization
and expression level assignment. Moreover, comparison of
published data is hampered by different modes of gene
expression analysis, possibly leading to a subjective choice
of genes.



Within this set of analyzed genes, most have
never been assigned as differentially expressed. Also, only
a few genes are identified as differentially expressed in
more than five experiments (Figure 1, B and C). From
those data, we might conclude that a comparison of microarray
data leads not to an accumulation of false positives
but to subsets of genes that might be worthwhile in further
analyses.

We were interested in those genes whose expression is
deregulated in a majority of tumors. Therefore, we decided
to concentrate on genes that are differentially expressed
in at least 8 of 11 analyzed tumors. We identified 100 genes as
commonly overexpressed and 21 as commonly downregulated
within the carcinomas investigated. Interestingly, we were
able to identify more genes as commonly overexpressed
than as commonly underexpressed in tumors. This
might be due to a higher variance of the observed gene
expression in normal tissues. This variance might result
from an inadequately low number of these tissues in individual
comparisons (e.g., there were only 4 normal breast
tissues and 68 tumor tissues analyzed by Sorlie et al. [19];
Table 1). Whereas normal tissue has to abide by the precise
function of an organ, which differs between organs, a
tumor has, first of all, the need to grow. To accomplish this
task, a tumor has to circumvent several safeguard functions
of cell maintenance and growth limitation [1]. To
bypass these, only a few genes have to be modified, which
might lead to a common tumor phenotype. Therefore, gene
expression of tumors of different organs might be
more homogeneous than the gene expression of normal
tissues from the same organs.

Within the set of overexpressed genes, we identified
genes from different functional categories. Interestingly,
genes of the ubiquitin ligation/proteasome protein
degradation pathway as well as transcription factors
were often found. Such genes have already been described
as overexpressed within tumors [1,27], supporting the
feasibility of our approach. In addition,
these genes represent key candidates for new targets for
cancer therapy. Validation of differential expression is key
for successful microarray experiments. This should also
apply to comparative analyses or meta-analyses. However,
tissue samples of different tumor entities can rarely be
obtained in adequate numbers. We validated the differential
expression of 8 of 100 overexpressed genes and
observed an overexpression in most cases. That not all
genes were overexpressed in all samples might be attributed
to the heterogeneity of gene expression in
tumors, to errors in the sampling of tissues, or to
the gene having been falsely assigned as differentially
expressed by our method.

Within the group of overexpressed genes, we identified
FOXM1 and ADAR1. FOXM1 is activated by the hedgehog
signalling pathway [28], indicating a common activation of
this elemental developmental pathway in cancers. ADAR is a
member of a diverse family of proteins involved in the editing
of mRNA. Overexpression of this gene might lead to a higher
amount of edited RNA and might therefore lead to mutated
proteins. A high level of ADAR within cancers therefore might
act like a mutator mutation. That enzymes of the RNA editing
pathway are capable of acting in this manner has been
shown for APOBEC [29].

In conclusion, using a novel approach to compare gene
expression data, we identified a set of genes that might be 
useful in the further analysis of fundamental signal transduc- 
tion pathways that lead to carcinomas. 